Analysis in geographic and semantic spaces

 

Entry Name:  "FIAIS-Andrienko-MC2"

VAST Challenge 2014
Mini-Challenge 2

 

 

Team Members:

Natalia Andrienko, Fraunhofer IAIS and City University London, natalia.andrienko@iais.fraunhofer.de (PRIMARY)

Gennady Andrienko, Fraunhofer IAIS and City University London, gennady.andrienko@iais.fraunhofer.de  

Georg Fuchs, Fraunhofer IAIS, georg.fuchs@iais.fraunhofer.de

 

Student Team:  NO

 

Analytic Tools Used:

V-Analytics (http://geoanalytics.net/V-Analytics; see also the book http://geoanalytics.net/vam), developed in the Knowledge Discovery department of Fraunhofer IAIS

 

Approximately how many hours were spent working on this submission in total?

30 hours for exploration and 24 hours for reporting

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2014 is complete? YES

 

Video:

FIAIS-Andrienko-MC2-video.wmv 

 

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

MC2.1 Describe common daily routines for GAStech employees. What does a day in the life of a typical GAStech employee look like? Please limit your response to no more than five images and 300 words.

 

Figure 1.1. The map shows repeatedly visited individual and public places extracted from the trajectories. The places have been semantically interpreted and classified based on the visit times and, where appropriate, card transaction data.

 

 

Figure 1.2. The time histograms represent the weekly temporal patterns of visiting different kinds of places, showing the typical locations/activities of the people in different hours of the week.

 

 

Figure 1.3. The place types, called semantic places, have been arranged in a spatial layout called semantic space. The movements between the geographic locations have been semantically abstracted and summarized into flows between semantic places. The widths of the flow symbols are proportional to the total counts of the moves between the respective types of places.
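As a rough illustration of the semantic abstraction behind Figure 1.3, the sketch below counts moves between place types in Python. It is not the V-Analytics implementation; it assumes that each trajectory has already been reduced to an ordered list of visited place ids and that a hypothetical `place_type` dictionary holds the semantic classification of each place.

```python
from collections import Counter


def semantic_flows(visit_sequences, place_type):
    """Count moves between semantic place types.

    visit_sequences: dict mapping a person (or vehicle) to the ordered list
        of place ids it visited (hypothetical input format).
    place_type: dict mapping a place id to its semantic type,
        e.g. "home", "work", "lunch".
    Returns a Counter keyed by (origin_type, destination_type).
    """
    flows = Counter()
    for places in visit_sequences.values():
        # Each pair of consecutive visits is one move; both ends are
        # abstracted to their semantic types before counting.
        for origin, destination in zip(places, places[1:]):
            flows[(place_type[origin], place_type[destination])] += 1
    return flows


# Illustrative, made-up input: two daily sequences of visited places.
place_type = {"H1": "home", "H2": "home", "C1": "breakfast/coffee",
              "G": "work", "R1": "lunch"}
sequences = {"p1": ["H1", "C1", "G", "R1", "G", "H1"],
             "p2": ["H2", "G", "R1", "G", "H2"]}
flows = semantic_flows(sequences, place_type)
print(flows[("work", "lunch")])  # 2
```

The flow symbol widths would then be drawn proportionally to these counts.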

 

 

Figure 1.4. The flows summarizing the trajectories react dynamically to filtering of the trajectories. We have selected the trajectories of the trucks (A), of the cars on weekdays (B), and of the cars on the weekend (C) to see the respective typical movements. Note the settings of the dynamic query filter and of the focuser that sets the maximal arrow width.

 

 

Figure 1.5. We have computed the flow intensities by hours of the week and clustered the hours by similarity of the flow intensities. The time arranger shows the weekly temporal pattern of the color-coded clusters. The small multiple maps represent the averaged flows corresponding to each time cluster, showing what movements are typical for different time intervals.
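The clustering method used for Figure 1.5 is not stated, so the sketch below substitutes plain k-means with a deterministic farthest-point initialization. Each of the 168 hours of the week becomes a vector of flow counts (one dimension per pair of semantic place types), and hours with similar vectors end up in the same cluster. The input format, the function names, and the choice of k-means are assumptions made for illustration.

```python
import math


def hourly_flow_vectors(moves, flow_keys):
    """Build one count vector per hour of the week (hour 0 = Monday 00:00).

    moves: iterable of (hour_of_week, (origin_type, destination_type)).
    flow_keys: list of (origin_type, destination_type) pairs used as dimensions.
    """
    index = {key: i for i, key in enumerate(flow_keys)}
    vectors = [[0.0] * len(flow_keys) for _ in range(168)]
    for hour, key in moves:
        vectors[hour][index[key]] += 1
    return vectors


def farthest_point_init(vectors, k):
    """Deterministic seeding: start from the first vector, then repeatedly
    add the vector farthest from all centroids chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, iterations=20):
    """Plain k-means (assumed stand-in for the unspecified clustering)."""
    centroids = farthest_point_init(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Illustrative, made-up moves: commuting at hour 8, lunch trips at hour 12.
keys = [("home", "work"), ("work", "lunch")]
moves = [(8, keys[0])] * 5 + [(12, keys[1])] * 4
labels, _ = kmeans(hourly_flow_vectors(moves, keys), k=3)
```

The resulting labels per hour are what a time arranger would color-code.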

 

 

Hence, a typical weekday routine is home – breakfast/coffee (between 7 and 8 o’clock) – work – lunch (12-13 o’clock) – work – home (17-18 o’clock) – {dinner, shopping, visits to colleagues} (18-20 o’clock) – home. On the weekend, people frequently go to restaurants, less frequently to shops, and occasionally to sport places or museums. On weekdays, trucks move between work (GasTech), various companies, and the airport; sometimes they go to restaurants for lunch.

 

MC2.2 Identify up to twelve unusual events or patterns that you see in the data. If you identify more than twelve patterns during your analysis, focus your answer on the patterns you consider to be most important for further investigation to help find the missing staff members. For each pattern or event you identify, describe

a. What is the pattern or event you observe?

b. Who is involved?

c. What locations are involved?

d. When does the pattern or event take place?

e. Why is this pattern or event significant?

f. What is your level of confidence about this pattern or event? Why?

 

Please limit your answer to no more than twelve images and 1500 words.

 

1) Repeated visits to 5 unknown places by 4 security employees

 

Bodrogi, Ferro, Mies (Minke), and Osvaldo, all having employment type “Security”, repeatedly visited 5 places labeled in Figure 1.1 as BFMO-1 to BFMO-5. These places do not correspond to any local businesses or to the homes of any employees. The places were usually visited before lunch.

 

Figure 2.1.  The table view shows when, by whom, and how long each place was visited. There were 5 cases when two or three persons were in the same place simultaneously: Bodrogi + Mies at BFMO-1 on 08/01/2014, Ferro + Osvaldo at BFMO-3 on 10/01/2014; Osvaldo + Bodrogi + Ferro at BFMO-4 on 15/01/2014; Mies + Osvaldo at BFMO-2 on 16/01/2014; Bodrogi + Ferro at BFMO-1 on 17/01/2014.
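Simultaneous presence at a place, as listed in the caption above, amounts to testing whether two visit intervals at the same place overlap. A minimal sketch, assuming hypothetical visit records of the form (person, place, start, end):

```python
from datetime import datetime
from itertools import combinations


def simultaneous_visits(visits):
    """Find pairs of different people present at the same place at the same time.

    visits: list of (person, place, start, end) tuples with datetime bounds.
    Returns (place, person_a, person_b, overlap_start, overlap_end) tuples.
    """
    found = []
    for (pa, place_a, sa, ea), (pb, place_b, sb, eb) in combinations(visits, 2):
        # Two intervals overlap iff each one starts before the other ends.
        if pa != pb and place_a == place_b and sa < eb and sb < ea:
            found.append((place_a, pa, pb, max(sa, sb), min(ea, eb)))
    return found


# Illustrative, made-up visit records.
visits = [
    ("P1", "X", datetime(2014, 1, 8, 11, 0), datetime(2014, 1, 8, 11, 40)),
    ("P2", "X", datetime(2014, 1, 8, 11, 20), datetime(2014, 1, 8, 12, 0)),
    ("P3", "X", datetime(2014, 1, 9, 11, 0), datetime(2014, 1, 9, 11, 30)),
]
for meeting in simultaneous_visits(visits):
    print(meeting)
```

A gathering of three people shows up as three pairwise overlaps.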

 

 

Figure 2.2. By filtering through the semantic space map, we have selected only those daily trajectories that include visits to BFMO places. There are 30 such trajectories, which are also shown in a summarized form as a set of flows. The map shows that the visitors typically went to the BFMO places from work and afterwards went for lunch, which means that the BFMO places are not lunch places.

 

 

2) Night visits of security employees to colleagues

 

Bodrogi, Mies, Osvaldo, and Isia Vann, all having employment type “Security”, visited their colleagues Campo-Corrente, Strum, Vasco-Pais, and Barranco at night. All the visited persons have the employment type “Executive”. Each time there were two visitors, one arriving shortly after 23:00 and the other shortly after 03:30. There were 4 such cases in total. In one case, both visitors stayed until about 07:30. In the three other cases, the first visitor left shortly before the second visitor arrived.

 

Figure 2.3. The table view shows all visits to colleagues’ homes selected by filtering through the semantic space map. Azada evidently had a party at his home on January 10, when he was visited by many colleagues. Suspicious night visits are highlighted in bold. By comparing Figure 2.3 with Figure 2.2, we see that on the day after each night visit, one of the visitors (the one who came later) appeared at one of the BFMO places.
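The selection of suspicious night visits can be approximated by a filter on the visit start time. In this sketch the night window (from 23:00 to 05:00) and the `residents` mapping from home places to the people living there are assumptions; the analysis above selected the visits interactively through the semantic space map.

```python
from datetime import datetime


def night_visits_to_colleagues(visits, residents, night_start=23, night_end=5):
    """Select visits to other people's homes that begin at night.

    visits: list of (person, place, start, end) with datetime bounds.
    residents: dict mapping a home place to the set of people living there
        (a shared home may have several residents).
    A visit is kept if it starts at or after night_start o'clock or before
    night_end o'clock and the visitor is not a resident of the place.
    """
    selected = []
    for person, place, start, end in visits:
        living_there = residents.get(place)
        if not living_there or person in living_there:
            continue  # not a home, or the visitor's own home
        if start.hour >= night_start or start.hour < night_end:
            selected.append((person, sorted(living_there), start, end))
    return selected


# Illustrative, made-up records: a night visit and a stay at one's own home.
residents = {"H_A": {"A"}}
example = [
    ("S1", "H_A", datetime(2014, 1, 7, 23, 10), datetime(2014, 1, 8, 3, 20)),
    ("A", "H_A", datetime(2014, 1, 8, 18, 0), datetime(2014, 1, 9, 7, 0)),
]
print(night_visits_to_colleagues(example, residents))
```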

 

 

3) Night visits to work (GasTech)

 

Figure 2.4. By spatio-temporal filtering, we have selected the visits to the work place (GasTech) at unusual times, i.e., not starting between 6:00 and 18:00. Alcazar visited GasTech four times at night. Also suspicious is the 15-minute visit by Truck 104 on January 16 at 20:00. On January 13, Truck 101 came unusually late.
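The temporal part of this filter is simple to express in code. A sketch, assuming that visits are stored as (person, place, start, end) records and that the work place is identified by the hypothetical label "GasTech":

```python
from datetime import datetime


def off_hours_visits(visits, place="GasTech", day_start=6, day_end=18):
    """Return (person, start, duration) for visits to `place` that do not
    start between day_start:00 and day_end:00."""
    return [
        (person, start, end - start)
        for person, visited, start, end in visits
        if visited == place and not (day_start <= start.hour < day_end)
    ]


# Illustrative, made-up records: one evening visit and one ordinary workday.
example = [
    ("Truck X", "GasTech", datetime(2014, 1, 16, 20, 0), datetime(2014, 1, 16, 20, 15)),
    ("E", "GasTech", datetime(2014, 1, 9, 7, 30), datetime(2014, 1, 9, 12, 0)),
]
print(off_hours_visits(example))
```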

 

 

4) Midday visits of two employees to the hotel

 

Figure 2.5. The table view shows the visits to the hotel selected through spatial filtering. Tempestad and Borrasca met four times in the hotel around lunch time on January 8, 10, 14, and 17. The remaining visits belong to Sanjorge Jr., who evidently stayed in the hotel from January 17 to 19.

 

 

 

5) Overnight stay of Nubarron at Kronos Capitol

 

Figure 2.6. We have selected the visits that lasted 6 or more hours to places other than home, work, or the hotel. Besides the already known visits to colleagues and to a BFMO place, we see a visit of Nubarron to Kronos Capitol that lasted almost 24 hours. However, this may only mean that Nubarron left his car at the Capitol and moved around without it during this time.
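The selection of long stays can be sketched as follows, assuming hypothetical per-person sets of home places and a set of places (such as work and the hotel) that are excluded for everyone. Visits to other people's homes are deliberately kept, as they are in the figure.

```python
from datetime import datetime, timedelta


def long_visits(visits, own_homes, always_excluded, min_duration=timedelta(hours=6)):
    """Select stays of at least min_duration at non-routine places.

    visits: list of (person, place, start, end) with datetime bounds.
    own_homes: dict mapping a person to the set of that person's home places.
    always_excluded: places ignored for everybody (e.g. work and the hotel).
    Returns (person, place, duration) tuples.
    """
    result = []
    for person, place, start, end in visits:
        if place in always_excluded or place in own_homes.get(person, set()):
            continue
        if end - start >= min_duration:
            result.append((person, place, end - start))
    return result


# Illustrative, made-up record of a long stay at an unspecified place.
example = [("N", "Place Q", datetime(2014, 1, 18, 10, 0), datetime(2014, 1, 19, 9, 30))]
print(long_visits(example, {"N": {"H_N"}}, {"GasTech", "Hotel"}))
```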

 

 

Figure 2.7. The table shows who else visited Kronos Capitol on January 18. One of the visitors, Bodrogi, is the security employee known for his visits to BFMO places and night visits to colleagues. The long stay of Nubarron’s car at the Capitol may be related to a meeting with Bodrogi.

 

 

6) Two homes of Hennie Osvaldo

 

For Hennie Osvaldo, two places have been classified as home places based on the temporal patterns of the place visits. The place visited slightly more frequently was labeled “home” and the other place “home 2”. “Home” coincides with the home places of Dedos and B. (Birgitta) Frente, and “home 2” coincides with the home places of Bodrogi, Ferro, and I. (Isia) Vann. Note that Bodrogi, Ferro, I.Vann, and Osvaldo are security employees known for their strange activities reported in sections 1 and 2.
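The report does not spell out how home places were recognized from the temporal patterns of the visits. One plausible heuristic, shown here purely as an assumption, counts the nights on which a visit covers 03:00 and treats the places with enough such nights as home candidates; a person who regularly sleeps at two places then gets two candidates, ranked by frequency.

```python
from collections import Counter
from datetime import datetime, time, timedelta


def home_candidates(visits, probe=time(3, 0), min_nights=2):
    """Rank each person's places by the number of nights spent there.

    visits: list of (person, place, start, end) with datetime bounds.
    A night is counted for a place when a visit covers the probe time of day
    (03:00 by default). Places with at least min_nights such nights are
    returned per person, most frequent first.
    """
    nights = Counter()
    for person, place, start, end in visits:
        day = start.date()
        while True:
            moment = datetime.combine(day, probe)
            if moment > end:
                break
            if moment >= start:
                nights[(person, place)] += 1
            day += timedelta(days=1)
    ranked = {}
    for (person, place), count in nights.most_common():
        if count >= min_nights:
            ranked.setdefault(person, []).append((place, count))
    return ranked
```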

 

Figure 2.8. The time histograms show when Osvaldo visited her two homes during the two-week period.

 

 

 

Figure 2.9. The semantic space map shows aggregated movements of Osvaldo. Osvaldo always returns from work to “home” rather than to “home 2”. From “home”, she goes for lunch/dinner and then often goes to “home 2”, which may be the home of Osvaldo’s partner. Moves to breakfast/coffee before work occurred more frequently from “home 2” than from “home”.

 

 

7) Strange movements of trucks

 

Towards the end of the data period, some trucks made strange movements back and forth along the same routes without stopping: (1) truck 101 driven by Albina Hafon in the afternoon of Monday, January 13; (2) truck 107 driven by Irene Nant in the afternoon of Wednesday, January 15; (3) trucks 104, 105, and 106, driven by Henk Mies, Valeria Morlun, and Dylan Scozzese, respectively, in the afternoon of Thursday, January 16; (4) truck 107 driven by Cecilia Morluniau around midday on Friday, January 17.

 

Figure 2.10. Screenshots of a space-time cube display showing the strange truck trajectories. The colors red, green, blue, yellow, and magenta correspond to trucks 101, 104, 105, 106, and 107, respectively. Stops appear as vertical segments of the trajectories. To make the stops easier to notice, we have extracted the points where the trucks stopped for at least one minute. These points are represented by violet balls. The display clearly shows that the trucks did not stop when repeatedly moving back and forth.
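Stop extraction of this kind can be sketched as grouping consecutive position records that stay close together. The 20 m radius in the sketch is an assumption, while the one-minute minimum duration matches the threshold stated above; the haversine formula gives distances in metres between latitude/longitude points.

```python
import math
from datetime import datetime, timedelta


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def extract_stops(points, radius_m=20.0, min_duration=timedelta(minutes=1)):
    """Find stops in a time-ordered list of (time, lat, lon) records.

    A stop is a run of consecutive records that all lie within radius_m of
    the run's first record and that spans at least min_duration. Two nearby
    records separated by a long time gap also count as a stop.
    Returns (start_time, end_time, lat, lon) tuples.
    """
    stops = []
    i = 0
    while i < len(points):
        j = i
        # Extend the run while the next record stays near the run's first record.
        while (j + 1 < len(points)
               and haversine_m(points[i][1], points[i][2],
                               points[j + 1][1], points[j + 1][2]) <= radius_m):
            j += 1
        if points[j][0] - points[i][0] >= min_duration:
            stops.append((points[i][0], points[j][0], points[i][1], points[i][2]))
        i = j + 1
    return stops


# Illustrative, made-up track: 90 s standing still, then driving away.
t0 = datetime(2014, 1, 16, 14, 0)
track = [(t0 + timedelta(seconds=s), lat, 24.87)
         for s, lat in [(0, 36.05), (30, 36.05), (90, 36.05), (100, 36.06), (110, 36.07)]]
print(extract_stops(track))
```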

 

 

8) Relationships between people

 

From the trajectories, we have extracted all meetings of the people and excluded the meetings that occurred at work and the meetings of people living together at their homes. From the remaining meetings, we have computed distances between individuals based on the relative frequencies of their meetings.
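The exact formula for turning meeting frequencies into distances is not given. The sketch below assumes one possible normalization: the number of meetings of two persons divided by the smaller of their individual meeting totals, with distance defined as one minus that ratio. The exclusion of meetings at work and between housemates is assumed to have been applied to the input beforehand, and the 2D projection step (for example, multidimensional scaling) is not shown.

```python
from collections import Counter
from itertools import combinations


def relationship_distances(meetings):
    """Pairwise distances from relative meeting frequencies.

    meetings: list of sets, each holding the people who took part in one
        meeting (already filtered as described above).
    Returns dict (person_a, person_b) -> distance in [0, 1], with names sorted.
    """
    pair_counts = Counter()
    person_counts = Counter()
    for group in meetings:
        for person in group:
            person_counts[person] += 1
        for a, b in combinations(sorted(group), 2):
            pair_counts[(a, b)] += 1
    distances = {}
    for a, b in combinations(sorted(person_counts), 2):
        # Assumed normalization: shared meetings relative to the less active person.
        relative = pair_counts[(a, b)] / min(person_counts[a], person_counts[b])
        distances[(a, b)] = 1.0 - relative
    return distances
```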

 

Figure 2.11. The map display shows the space of interpersonal relationships. The 2D projection has been obtained based on the pairwise distances between the individuals. The dots represent the individuals and are colored according to their employment types. The curved connecting lines represent the strengths of the relationships between the individuals (i.e., the relative meeting frequencies) by proportional widths and opacities. We see a tight group of security employees (the group also includes two non-security persons). Two security persons, Cocinaro and Osvaldo, bridge this group with another tight group formed by engineering and information technology employees. Another group of engineers is relatively separated from the latter group and from the security group but has strong links to executive staff.

 

 

Figure 2.12. In the same map display as before, the connecting lines represent the absolute numbers of meetings. We have applied filtering to see the people met by Sanjorge (Jr.) during his stay in Abila from January 17 till 19. Here the maximal number of meetings is 2.

 

 

 

MC2.3 Like most datasets, the data you were provided is imperfect, with possible issues such as missing data, conflicting data, data of varying resolutions, outliers, or other kinds of confusing data. Considering MC2 data is primarily spatiotemporal, describe how you identified and addressed the uncertainties and conflicts inherent in this data to reach your conclusions in questions MC2.1 and MC2.2. Please limit your response to no more than five images and 300 words.

 

1) Track of E. Orilla

 

Figure 3.1. The track of Elsa Orilla, which is represented by the blue line, is extremely noisy (zigzagged).

A: By comparing the stop positions extracted from the trajectory (orange dots) with the position of GasTech and the positions of the businesses where E. Orilla paid by credit card (light blue circles), we see that the positions are systematically shifted to the northwest. The positions of the businesses had been determined earlier based on the stop positions of the other employees.

B: We have modified all positions in E. Orilla’s trajectory by subtracting the average deviations of the latitudes and longitudes of the stop positions from the locations where they were supposed to be. Although the newly extracted stop positions (orange dots) are scattered due to the noise, their clusters cover or overlap with the real positions of the places visited by E. Orilla. To deal with the noise when extracting the repeatedly visited places of E. Orilla, we have set a sufficiently large distance threshold (150 m) for the stop point clustering.
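The correction in step B is a constant offset: the mean latitude and longitude differences between the extracted stops and the places they should coincide with are subtracted from every position of the track. A minimal sketch, assuming the stops have already been paired with their expected places (the function name is hypothetical):

```python
def correct_systematic_shift(track, observed, expected):
    """Remove a constant positional offset from a track.

    track: list of (time, lat, lon) records.
    observed: stop positions (lat, lon) extracted from the track.
    expected: known positions (lat, lon) of the places those stops belong to,
        in the same order as `observed`.
    Returns the corrected track and the (dlat, dlon) offset that was removed.
    """
    n = len(observed)
    dlat = sum(o[0] - e[0] for o, e in zip(observed, expected)) / n
    dlon = sum(o[1] - e[1] for o, e in zip(observed, expected)) / n
    corrected = [(t, lat - dlat, lon - dlon) for t, lat, lon in track]
    return corrected, (dlat, dlon)
```

A positive `dlat` with a negative `dlon` corresponds to a shift to the northwest.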

 

 

2) Finding meetings of people from noisy data

 

In finding meetings of people, we accounted for possible positioning errors by using a sufficiently large threshold (150 m) for the spatial distance between positions of different individuals.

 

Figure 3.2. The map and the space-time cube show the meetings of two or more people extracted from the set of trajectories. The meetings are represented by spatial buffers in red; in the STC, the buffers are extended vertically in proportion to the meeting durations. Cyan circles on the map and balls in the STC represent the stop positions. In finding the meetings, we excluded the stop positions at GasTech, where all employees regularly met.
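Meeting extraction with a distance tolerance can be sketched as a pairwise test on stops: two stops of different people form a meeting when their time intervals overlap and their positions lie within 150 m of each other. The stop record layout and the way excluded places (such as GasTech) are passed in are assumptions; the haversine formula provides distances in metres.

```python
import math
from datetime import datetime, timedelta
from itertools import combinations


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def find_meetings(stops, excluded=(), max_dist_m=150.0):
    """stops: list of (person, start, end, lat, lon).
    excluded: (lat, lon) places whose nearby stops are ignored.
    Returns (person_a, person_b, overlap_start, overlap_end) tuples."""
    candidates = [
        s for s in stops
        if all(haversine_m(s[3], s[4], lat, lon) > max_dist_m for lat, lon in excluded)
    ]
    meetings = []
    for a, b in combinations(candidates, 2):
        if a[0] == b[0]:
            continue
        # Overlapping time intervals and positions within the tolerance.
        if a[1] < b[2] and b[1] < a[2] and haversine_m(a[3], a[4], b[3], b[4]) <= max_dist_m:
            meetings.append((a[0], b[0], max(a[1], b[1]), min(a[2], b[2])))
    return meetings


# Illustrative, made-up stops: A and B about 55 m apart, C about 1 km away.
t = datetime(2014, 1, 10, 12, 0)
stops = [
    ("A", t, t + timedelta(hours=1), 36.05, 24.87),
    ("B", t + timedelta(minutes=30), t + timedelta(minutes=90), 36.0505, 24.87),
    ("C", t, t + timedelta(hours=1), 36.06, 24.87),
]
print(find_meetings(stops))
```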

 

 

3) Missing data

 

Large spatial gaps between consecutive position records indicate that many intermediate positions are missing. Such cases manifest themselves on a map as long straight trajectory segments that do not follow any streets. When positions are missing, we cannot determine the durations of staying at the last recorded locations and cannot know where the vehicles were between the recorded positions.

 

Figure 3.3. The map shows the trajectory segments where the distances between consecutive points exceed 250 m. The dots mark the starting positions of the segments, i.e., the last positions before the gaps. Different colors correspond to different individuals/vehicles: blue to Calzas, pink to E. Orilla, and red to truck 107. For Orilla, there was only one gap, between the first record and the remaining part of the trajectory. For Calzas, there were 28 gaps that occurred repeatedly throughout the time period of the data. The position recording was often interrupted at Calzas’s home and at some public places, but there were also breaks in other locations. For truck 107, there were 7 gaps that began at either GasTech or Carlyle Chemical Inc.; each time, the next recorded position was at the other of the two places.
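Finding such gaps only requires the distance between consecutive records. A sketch, assuming a track given as a time-ordered list of (time, lat, lon) tuples and using the 250 m threshold from Figure 3.3:

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def find_gaps(track, max_step_m=250.0):
    """Return (last_point_before_gap, next_point, distance_m) for every pair
    of consecutive records that are more than max_step_m apart."""
    gaps = []
    for p, q in zip(track, track[1:]):
        d = haversine_m(p[1], p[2], q[1], q[2])
        if d > max_step_m:
            gaps.append((p, q, d))
    return gaps
```

The first element of each result corresponds to a dot in Figure 3.3.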

 

 

4) Wrong daytimes in card transaction data

 

The times of day in the card transaction records from the coffee shops “Bean There Done That”, “Brewed Awakenings”, and “Jack’s Magical Beans” are always the same: 12:00:00. Therefore, it was not possible to determine the positions of these coffee shops from the positions of the people who paid by credit card at the moments of payment. To determine the positions of the coffee shops, we considered the spatial clusters of stops in places that had not been identified yet and compared the lists of people who stopped there with the lists of coffee shop visitors who paid by credit card.
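The list comparison can be scored with a set similarity measure. The sketch below uses the Jaccard index over (person, day) pairs as an assumed stand-in for the comparison described above: each coffee shop is assigned to the stop cluster whose visitors best match its card payers. Cluster ids and the input layout are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_shops_to_clusters(cluster_visitors, shop_payers):
    """Assign each shop to the stop cluster whose visitors best match its payers.

    cluster_visitors: dict cluster id -> set of (person, day) pairs of people
        who stopped in that cluster.
    shop_payers: dict shop name -> set of (person, day) pairs of people who
        paid there by credit card.
    Returns dict shop name -> (best cluster id, similarity).
    """
    result = {}
    for shop, payers in shop_payers.items():
        scores = {c: jaccard(payers, visitors) for c, visitors in cluster_visitors.items()}
        best = max(scores, key=scores.get)
        result[shop] = (best, scores[best])
    return result
```

Comparing per-day pairs rather than names alone keeps a regular visitor of several places from matching all of them equally well.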

 

Figure 3.4. The spatial clusters of stops that have been assigned to the three coffee shops by comparing the lists of stopping people with the lists of credit card payments on each day.

 

 

5) Matching trucks to drivers and determining the locations of the businesses visited by the trucks

 

We have extracted stops from the truck trajectories and compared the times of the stops with the times of card transactions made at various businesses by people who did not use GasTech cars privately. This allowed us to determine the probable drivers of the trucks (these are the people who paid by card during the truck stops) and the spatial locations of the businesses that were visited only by trucks (from the spatial positions of the stops). The businesses include Abila Airport, Abila Scrapyard, Carlyle Chemical Inc., Kronos Pipe and Irrigation, Maximum Iron and Steel, Nationwide Refinery, and Stewart and Sons Fabrication.
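The time matching can be sketched as a join between truck stops and card transactions: a payment made while a truck is stopped suggests who was driving and which business the stop belongs to. The record layouts and the optional `tolerance` parameter are assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def match_truck_stops(truck_stops, transactions, tolerance=timedelta(minutes=0)):
    """Link truck stops to card transactions made during the stops.

    truck_stops: list of (truck, start, end, lat, lon).
    transactions: list of (person, business, paid_at); assumed to be already
        restricted to people who did not use GasTech cars privately.
    Returns (drivers, business_positions):
        drivers[truck] counts the people paying during that truck's stops;
        business_positions[business] lists the positions of matching stops.
    """
    drivers = defaultdict(Counter)
    business_positions = defaultdict(list)
    for truck, start, end, lat, lon in truck_stops:
        for person, business, paid_at in transactions:
            if start - tolerance <= paid_at <= end + tolerance:
                drivers[truck][person] += 1
                business_positions[business].append((lat, lon))
    return drivers, business_positions


# Illustrative, made-up records: one stop and one payment during it.
t = datetime(2014, 1, 9, 10, 0)
drivers, positions = match_truck_stops(
    [(101, t, t + timedelta(minutes=30), 36.07, 24.85)],
    [("driver_a", "business_x", t + timedelta(minutes=15))],
)
```

The most frequent payer per truck is the probable driver, and the stop positions per business give its location.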

 

Figure 3.5. The map shows the spatial clusters of truck stops that occurred in places not visited by personal cars. The place names have been determined by matching the times of the stops with the times of card transactions of people who did not use personal cars.